The Plant Genome — Latest Matching Preprints

1

Genome-Wide Markers Predict Metribuzin Tolerance in Southern Soft Red Winter Wheat

Sellani, J.; Anzueto, H.; Arcenaux, K.; Price, P. T.; Brown-Guedira, G.; Harrison, S.; DeWitt, N.

2026-07-03 genomics 10.64898/2026.06.28.733875 medRxiv

Top 0.1%

33.9%

Metribuzin is a versatile herbicide effective against various annual grasses and broadleaf weeds found in wheat fields. However, it can cause foliar damage to wheat, impacting plant health and yield. A clearer understanding of the genetic architecture associated with metribuzin tolerance is necessary to guide marker-based breeding strategies. This study evaluated 351 historic Gulf Atlantic Wheat Nursery (GAWN) wheat breeding lines representative of southern US soft red winter wheat (SRWW) germplasm. Field trials were conducted at Winnsboro (WN) and Baton Rouge (BR), Louisiana, in 2016 and 2017. Metribuzin was applied at specific growth stages, and tolerance was assessed based on visual foliar damage. Genomic data from 6,252 filtered single nucleotide polymorphism (SNP) markers were used to estimate narrow-sense heritability, conduct genome-wide association studies (GWAS), and assess genomic prediction accuracy using genomic best linear unbiased prediction (GBLUP). Broad-sense heritability ranged from 0.54 to 0.69 within environments and reached 0.77 across environments, while narrow-sense heritability ranged from 0.35 to 0.47, indicating moderate additive genetic control. No SNP surpassed the significance threshold, but genomic prediction (GP) showed moderate to strong predictive ability (PA) across environments, with the highest accuracy (r = 0.62) observed between BR17 and WN17. These results indicate that metribuzin tolerance in SRWW is primarily controlled by multiple small-effect loci and that genomic selection (GS) provides a more effective breeding strategy than marker-assisted selection for improving tolerance in southern wheat germplasm.

2

Comparison of localGEBV and Optimal Haplotype Stacking Fitness Functions using a Novel R Package: HapSelect

Shaffer, W.; Papin, V.; Carter, Z.; Brunner, S. M.; Tong, J.; Villiers, K.; Robinson, H.; Voss-Fels, K.; Hayes, B. J.; Hickey, L.; Dinglasan, E.

2026-07-13 genetics 10.64898/2026.07.08.737160 medRxiv

Top 0.1%

30.7%

Haplotype-based breeding strategies have emerged as promising approaches to maximize long-term genetic gain by identifying complementary parental combinations while maintaining genetic diversity. However, these methods typically require phased genotypes and more intensive workflow pipelines and skillsets. We developed a novel local genomic estimated breeding value (localGEBV) fitness function with similar intent to the optimal haplotype stacking (OHS) framework fitness function and implemented both in the novel R package, HapSelect. Our aim was to evaluate whether phased haplotypes provide additional benefit over the more easily available dosage-based unphased genotypes in highly inbred crops. A subset of a bread wheat nested association mapping (NAM) population comprising 444 lines genotyped with 6,054 DArT-Seq markers was analysed. Marker effects were estimated using rrBLUP, localGEBV and haplotype effects were calculated across linkage disequilibrium-defined haploblocks, and genetic algorithms (GA) were used to identify optimal sets of 30 founders using either a localGEBV-derived fitness function with unphased dosage inputs or the OHS fitness function with phased inputs. Selected parental sets were compared with conventional truncation selection (TS) through 150 generations of forward simulation. The OHS fitness function achieved a marginally greater optimized ultimate GEBV than the localGEBV fitness function during GA optimization, with only 18 of the 30 selected founders overlapping between the two methods. Despite these differences, forward simulations demonstrated nearly identical long-term genetic gain for localGEBV and OHS-selected founders, with both approaches outperforming conventional truncation selection by maintaining greater genetic diversity and delaying the genetic plateau. The minimal difference between localGEBV and OHS is likely attributable to the high homozygosity of the population, where localGEBV and haplotype effects are nearly confounded.
These results demonstrate that dosage-based localGEBV provides a practical alternative to phased haplotype approaches for parent selection in inbred crops, substantially simplifying genomic workflows while maintaining long-term breeding performance. Future work should evaluate these methods in more diverse inbred populations and outbred species, where greater haplotypic diversity may increase the advantage of true haplotype-based optimizations.

3

Knowledge-guided Bayesian optimization using pre-trained LLMs speeds up the identification of superior genotypes from germplasm collection

Hamazaki, K.; Tsuda, K.

2026-07-02 bioinformatics 10.64898/2026.06.28.735149 medRxiv

Top 0.1%

18.6%

Background: Germplasm collections contain wide genetic diversity that is valuable for plant breeding, but conducting phenotypic evaluation for all genotypes in field trials is rarely feasible. Bayesian optimization offers a way to decide, season by season, which genotypes to cultivate in order to identify superior genotypes with fewer evaluations. However, standard Bayesian optimization commonly starts from randomly selected genotypes and mainly relies on surrogate models built from marker genotype information, while the text-based passport information that accompanies germplasm is not fully used. We examined whether pre-trained large language models can provide prior knowledge that improves these decisions in germplasm evaluation. Results: We constructed a large-language-model-guided Bayesian optimization framework that introduces large language models into two parts of the Bayesian optimization workflow. In zero-shot warmstarting, a large language model proposes initial genotypes using passport information such as cultivar name, country of origin, and subpopulation, optionally together with principal component scores derived from genome-wide single-nucleotide-polymorphism markers. In addition, we evaluated a large-language-model-based surrogate model that predicts phenotypic values for untested genotypes using in-context learning from previously evaluated genotypes. Using a rice germplasm panel and two target traits (seed number per panicle for maximization and protein content for minimization), we compared strategies. For seed number per panicle, zero-shot warmstarting with a general-purpose instruction-following model reduced the number of evaluated genotypes needed to reach the best genotype, whereas improvements were small for protein content. 
When genomic information was available, Gaussian-process-based Bayesian optimization was the strongest overall approach, while the large-language-model-based surrogate model outperformed random baselines and was competitive in some settings. When genomic information was not available, predictions based on passport information improved efficiency compared with fully random strategies. Conclusions: Pre-trained large language models can inject useful agronomic knowledge into Bayesian optimization for germplasm evaluation, particularly by improving early-stage genotype selection, and can also support optimization when genomic information is unavailable. As models better handle long genomic sequences together with passport information, large-language-model-guided Bayesian optimization may become a practical and explainable decision-support approach for agricultural optimization.

4

Haplotypes variations of yellow stripe like (TaYSL) genes are associated with grain iron and zinc contents in wheat (Triticum aestivum L.)

Abbasi, K.; Qayyum, H.; Naseer, S.; Sun, M.; Quraishi, M. A.; Danyal, Y.; Hao, Y.; He, Z.; Rasheed, A.

2026-07-08 plant biology 10.64898/2026.06.17.732851 medRxiv

Top 0.1%

13.0%

The availability of pangenome and resequencing data for wheat collections has facilitated the discovery of gene-trait associations in wheat. Yellow stripe-like (YSL) proteins play a key role in the uptake and translocation of metals and yet have not been fully identified and analyzed at the genome-wide level in wheat. In this study, 26 TaYSL genes were identified and divided into four distinct clades, each clade sharing similar domains and motif compositions. Most genes were upregulated under iron deficiency, whereas homoeologs of TaYSL1 were downregulated. Both SNP-based and haplotype-based association studies were used to dissect the role of TaYSLs underpinning grain iron contents (GFeC) and zinc contents (GZnC) in wheat. TaYSL6-2B and TaYSL16-1A haplotypes showed strong association with GFeC, and TaYSL14-6A showed strong association with GZnC in multiple field trials. The distribution of favorable haplotypes in a global wheat collection of ~3,000 accessions showed that the majority of haplotypes were more prevalent in landraces and winter wheat than in modern cultivars and spring types, indicating their potential for use in breeding. The combination of favorable haplotypes of the three YSL genes associated with GFeC and GZnC was very rare, and most of the wheat accessions have one or two favorable haplotypes. These findings provide the first comprehensive characterization of the TaYSL gene family in wheat and identify significant SNPs and elite haplotypes that can be utilized for genetic improvement and biofortification.

5

From Phenomics to Genomics: Macro-GWAS of Almond Morphology and Quality

Mas Gomez, J.; Rubio Angulo, M.; Duval, H.; Dicenta, F.; Martinez-Garcia, P. J.

2026-07-07 plant biology 10.64898/2026.07.06.736816 medRxiv

Top 0.1%

12.7%

In plant breeding and genetics, recent advances in high-throughput phenotyping are beginning to meet the growing demand for large-scale, high-quality phenotypic data that emerged after the development of next-generation sequencing technologies. Recent developments in phenomics have been incorporated into almond breeding programs, facilitating the large-scale acquisition of quantitative phenotypes and the dissection of the genetic architecture underlying morphological and quality-related traits. The implementation of a high-throughput phenotyping platform integrating RGB and hyperspectral imaging with genotyping using the 60K almond SNP array enabled the large-scale characterization of almond populations and the identification of 567 robust marker-trait associations across 66 traits. These analyses revealed two major genomic hotspots on chromosomes 2 and 5 associated with morphological and quality-related traits. These regions harbored biologically relevant candidate genes, including genes associated with OVATE family proteins, brassinosteroid signaling, protein ubiquitination, and acyl-CoA metabolism, as well as other regulators of organ growth, cell proliferation, hormone signaling, and seed development. Furthermore, a novel candidate gene encoding a COMT-like O-methyltransferase involved in lignin biosynthesis was identified and proposed to contribute to shell hardness, a major genetically controlled trait in almond. Together, these findings demonstrate the potential of integrating high-throughput phenomics and genomics to dissect complex traits, identify candidate genes, and accelerate genomics-informed breeding in almond.

6

A genetic toolkit to reduce wheat immunogenicity and incidence of celiac disease

Rottersman, M. G.; Laudencia-Chingcuanco, D.; Zhang, W.; Guzman-Lopez, M. H.; Lin, J. W.; Zhang, J.; Caseys, C.; Burguener, G.; Kim, S.; Zhang, X.; Yunusbaev, U.; Akhunov, E.; Lee, J.-Y.; Dubcovsky, J.

2026-07-08 plant biology 10.64898/2026.06.23.734071 medRxiv

Top 0.1%

11.9%

Celiac disease (CeD) is an immune-mediated condition triggered by wheat gluten in genetically predisposed individuals. The immune reaction in people with CeD is driven by particular gluten amino acid sequences, or immunogenic epitopes. Some of these epitopes elicit strong immune responses in the majority of CeD patients and are designated as immunodominant epitopes. Previous research has shown correlations between the amount of immunogenic wheat epitopes consumed and the onset of CeD, suggesting that reducing wheat immunogenic epitopes may reduce CeD incidence at the population level. Gluten consists of gliadins and glutenins, with gliadins having the majority of the immunodominant epitopes and glutenins playing a major role in dough strength and breadmaking quality (BMQ). This study used radiation-induced deletions, chemical mutagenesis, and natural variation in wheat (Triticum aestivum) to generate genetic stocks with reduced immunogenic epitope content. Most lines were developed in the wheat cultivar Summit, for which we produced a full genome assembly and annotation. We used exome capture to characterize these deletions and identify prolamins located within and outside the deletions. We combined different deletions and developed molecular markers to facilitate their deployment. For chromosome arms 1BS and 1DS, we generated two alternative lines: one lacking immunogenic epitopes for the development of CeD-safe genetic stocks for research purposes, and another retaining selected glutenins for breeding commercial lines with reduced immunogenicity and adequate BMQ. By making these non-transgenic genetic stocks publicly available, we aim to accelerate the development of wheat varieties with reduced immunogenicity and, eventually, a fully CeD-safe wheat.

7

Enhancing predictive accuracy of yield traits in cassava through multi-trait genomic prediction

de Freitas, G. M.; Certuche, D. S.; Jannink, J.-L.; de Oliveira, E. J.; Garcia, A. A. F.

2026-07-06 genetics 10.64898/2026.07.01.735838 medRxiv

Top 0.1%

11.4%

Multi-trait genomic prediction offers a practical route to improve selection for costly, complex traits in clonally propagated crops such as cassava. In a Brazilian breeding panel of 1,078 cassava clones genotyped with 25,923 SNPs and phenotyped for six agronomic traits, we compared single-trait (ST) and multi-trait (MT) GBLUP models. Stage-wise mixed models produced BLUEs that fed into ST and MT-GBLUP. We tested five cross-validation schemes that mimic breeder realities: ST baseline (CV1); naive all-traits MT prediction for unphenotyped candidates (CV2); MT prediction using auxiliary trait phenotypes in the test set (CV3); and two sparse-phenotyping regimes with missingness by trait (CV4) or by clone (CV5) at 25%, 50%, and 75% levels. The main results were that, under the ST baseline (CV1), predictive ability ranged from 0.50 for DMC and 0.45 for FRY down to 0.13 for Le.Dis. A naive full MT model (CV2) performed approximately on par with ST-GBLUP. In contrast, MT designs (CV3) that included informative auxiliary traits, such as shoot yield and combinations with plant vigor and leaf disease severity, yielded small gains for DMC with predictive ability of approximately 0.51 (+2%), while FRY predictive ability increased to approximately 0.65 (+44%), accompanied by RMSE reductions for FRY up to approximately 13.5% (e.g. RMSE approximately 6.2). Sparse-phenotyping simulations (CV4/CV5) demonstrated that MT models sustain or even improve predictive ability under realistic missing-data regimes (PA ≈ 0.62-0.65). Selection concordance between MT and ST top-10% sets was generally high (>0.80), and MT configurations produced measurable improvements in expected selection response and genetic gain per cycle for several target traits.
These results indicate that strategically implemented MT-GBLUP, using a small set of biologically and operationally informative auxiliary traits and optimized sparse phenotyping, can materially increase predictive accuracy and selection efficiency for economically critical cassava traits while reducing phenotyping burden.
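The percentage gains quoted in this abstract can be reproduced from the predictive abilities it reports. The short Python sketch below is illustrative only: the function name `relative_gain_percent` and the dictionary layout are assumptions introduced here, and the inputs are the rounded values given in the abstract (ST baseline versus the best MT design), not the authors' underlying data.

```python
def relative_gain_percent(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (improved - baseline) / baseline


# Rounded predictive abilities quoted in the abstract:
# (single-trait baseline, multi-trait design with auxiliary traits).
reported = {
    "DMC": (0.50, 0.51),
    "FRY": (0.45, 0.65),
}

for trait, (st_pa, mt_pa) in reported.items():
    gain = relative_gain_percent(st_pa, mt_pa)
    print(f"{trait}: {st_pa:.2f} -> {mt_pa:.2f} ({gain:+.0f}%)")
```

With these inputs the relative gains round to +2% for DMC and +44% for FRY, which agrees with the figures stated in the abstract.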

8

Multi-trait evaluation of a tomato MAGIC population identifies promising lines with improved nitrogen use efficiency (NUE)

Baraja-Fonseca, V.; Gil-Villar, D.; Bancic, J.; Renau-Morata, B.; Salud Justamante, M.; Plazas, M.; Gramazio, P.; Vilanova, S.; Perez-Perez, J. M.; Granell, A.; Molina, R. V.; Nebauer, S. G.; Prohens, J.; Arrones, A.

2026-07-15 plant biology 10.64898/2026.07.14.738388 medRxiv

Top 0.1%

11.3%

Nitrogen-use efficiency (NUE) is a pivotal breeding target in tomato (Solanum lycopersicum L.) to sustain production under reduced N inputs. Here, we leveraged a recently developed tomato multi-parent advanced generation inter-cross (ToMAGIC) population to identify lines with superior performance under reduced N availability. The eight founders and a core subset of 118 ToMAGIC lines were characterized with 10,684 SNP markers and evaluated under optimal (opN, 15 mM) and suboptimal (subN, 8 mM) N supply in an experiment totalling 1,576 plants, generating 48,068 data points across 61 phenotypic variables. Under both N treatments, ToMAGIC lines exhibited transgressive segregation for most traits, confirming the value of this population as a reservoir of untapped variation. Notably, under subN conditions, harvest index (Hi) increased by 29-44%, suggesting adaptive resource redistribution toward reproductive sinks. Variance partitioning revealed that agronomic and NUE-related traits were largely under genetic control, with heritability estimates frequently above 0.80 and broadly conserved across N treatments. Multivariate trait analysis identified fruit yield N concentration (NUE component, CN,y), shoot biomass N content (NAb), and shoot growth-related traits as the main drivers of treatment differentiation. Finally, proxy traits were prioritized by integrating response magnitude, heritability, trait correlations, and treatment-discriminatory power into multi-trait selection indices. This strategy generated favorable predicted genetic gains, reaching 158% for high-performance lines and 170% for subN-adapted lines, and consistently identified lines 402, 428, 518, 800, and 816 as promising pre-breeding materials. Overall, this study supports ToMAGIC as a powerful resource for developing N-efficient cultivars suited for sustainable agriculture.

9

Novel quantitative trait loci conferring broad-based resistance to root-knot nematodes in lima bean (Phaseolus lunatus)

Tajima, A. M.; Matthews, W. C.; Duong, T.; Khanh, T. D.; Baniya, A.; Penmetsa, R. V.; Parker, T.; Farmer, A.; English, S.; Diepenbrock, C.; Gepts, P.; Roberts, P. A.; Huynh, B.-L.

2026-07-09 plant biology 10.64898/2026.06.30.735594 medRxiv

Top 0.2%

7.2%

Lima bean (Phaseolus lunatus) is a broadly adapted, economically important leguminous crop and a susceptible host of root-knot nematodes (Meloidogyne spp.; RKN), which are devastating plant pathogens in agricultural systems worldwide. To date, there have been few studies to elucidate the genetic determinants of RKN resistance in lima beans. Understanding the genetic mechanisms underlying resistance is essential for improving resistance traits and incorporating them into lima bean breeding programs. To assist in marker-assisted selection, we aimed to identify and map quantitative trait loci (QTLs) conferring RKN resistance-related traits. Three recombinant inbred line (RIL) populations were used in this study. The populations were derived by crossing two RKN-resistant parents with the same RKN-susceptible parent and with each other. All populations were genotyped using genome-wide single-nucleotide polymorphism (SNP) markers. Each population was screened for root galling (RG) and RKN egg reproduction (ER) in response to M. incognita and M. javanica in greenhouse experiments. Three major QTLs were detected and mapped on chromosomes Pl04 (QRk-pl04.1), Pl05 (QRk-pl05.1) and Pl10 (QRk-pl10.1) across populations. Among them, QRk-pl05.1 and QRk-pl10.1 affected levels of RG and ER of both RKN species, while QRk-pl04.1 suppressed root galling and reproduction responses of M. incognita but not of M. javanica. These chromosomal regions defined by flanking markers will help guide marker-assisted breeding and gene discovery for broad-based RKN resistance in lima beans.

10

Diversity Assessment with SNP, SSR, AFLP, and RAPD Markers in Plants: A Systematic Review and Meta-Analysis

Olagunju, Y. O.; Olawuyi, O. J.

2026-07-07 plant biology 10.64898/2026.07.03.736291 medRxiv

Top 0.2%

5.4%

Background. DNA-based molecular markers underpin plant genetic diversity assessment, germplasm characterisation, and conservation prioritisation. Four marker systems dominate the field: amplified fragment length polymorphisms (AFLPs), simple sequence repeats (SSRs), single nucleotide polymorphisms (SNPs), and random amplified polymorphic DNA (RAPDs). No quantitative meta-analysis had pooled their performance on the canonical diversity metrics (polymorphism information content (PIC), expected heterozygosity (He), and resolution power) across plants. Existing reviews are narrative, restricted to particular marker types, or conclude qualitatively that pooling is infeasible. Methods. A PRISMA 2020-compliant systematic review (registered at the Open Science Framework) was executed. Eligible studies were within-study paired comparisons genotyping the same accession panel with at least two of {SNP, SSR, AFLP, RAPD} and reporting at least one diversity metric. Effect sizes were paired standardised mean differences (Hedges' g) computed under the Bernoulli-variance approximation. Random-effects REML meta-analysis used metafor 5.0.1 with Knapp-Hartung adjustment, leave-one-out, and r-sensitivity. Results. Fifteen within-study paired contrasts were eligible, distributed across three pools. Pool 2 (SSR vs SNP, He, k = 5) yielded a pooled Hedges' g of 0.494 (95% CI: -0.078 to 1.066, p = 0.075; I-squared = 90.2%; 95% PI [-0.82, 1.81]). SSRs exceeded SNPs on He in 4 of 5 studies; leave-one-out removal of the panel-size-asymmetric outlier raised the estimate to g = 0.644 (p = 0.025). Pool 3a (dominant-marker stratum, k = 6) yielded g = 0.419 (95% CI: -0.121 to 0.960, p = 0.103; I-squared = 56.5%); five of six contrasts showed SSR or AFLP exceeding RAPD on per-locus PIC. Pool 1 (PIC, k = 3, exploratory) gave a consistent direction (g = 0.453). All three pools point in the same direction: codominant or AFLP markers carry more per-locus information than the alternative being compared. Conclusions.
SSR markers reported higher per-locus diversity than SNP and RAPD markers in plant within-study paired comparisons, mechanistically grounded in the SNP biallelic ceiling and the multi-allelic richness of SSRs. The effect attenuated or reversed in selfing/low-diversity panels and at the per-panel level when SNP panels exceeded approximately 1000 loci. RAPDs show the lowest per-locus information content of the four classes.
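The pooled estimates, confidence intervals, and p-values reported for Pools 2 and 3a can be checked against one another. The Python sketch below is a consistency check, not the authors' analysis: it assumes that the Knapp-Hartung intervals are based on a Student's t distribution with k - 1 degrees of freedom, hard-codes the standard 97.5% critical values for 4 and 5 degrees of freedom, and uses hypothetical helper names (`t_pdf`, `two_sided_p`, `check_pool`). It back-calculates the standard error from the interval width and recomputes the two-sided p-value by numerical integration.

```python
import math

# Standard table values of the 97.5% quantile of Student's t.
T_CRIT_975 = {4: 2.776, 5: 2.571}


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value, integrating the density over [0, |t|] with Simpson's rule."""
    upper = abs(t_stat)
    h = upper / steps  # `steps` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return 1.0 - 2.0 * area


def check_pool(g: float, ci_low: float, ci_high: float, k: int):
    """Recover the standard error, t statistic, and p-value from a reported 95% CI."""
    df = k - 1
    se = (ci_high - ci_low) / (2 * T_CRIT_975[df])
    t_stat = g / se
    return se, t_stat, two_sided_p(t_stat, df)


# Values as reported in the abstract.
for name, args in {"Pool 2": (0.494, -0.078, 1.066, 5),
                   "Pool 3a": (0.419, -0.121, 0.960, 6)}.items():
    se, t_stat, p = check_pool(*args)
    print(f"{name}: SE = {se:.3f}, t = {t_stat:.2f}, p = {p:.3f}")
```

Under these assumptions the recomputed p-values land close to the reported 0.075 and 0.103, so the intervals and p-values in the abstract appear internally consistent.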

11

Integrated pangenome and population genomics reveal selection on standing genetic variation driving fiber flax-linseed divergence

You, F. M.; Zheng, C.; Edwards, T.; Li, P.; Rashid, K. Y.; Duguid, S. D.; Booker, H.; Cloutier, S.

2026-07-14 genomics 10.64898/2026.07.09.737549 medRxiv

Top 0.2%

5.3%

Flax (Linum usitatissimum L.) has been domesticated for dual end uses as linseed and fiber flax, yet the genomic basis of morphotype divergence remains unclear. Here, we constructed a morphotype-resolved pangenome by integrating three newly generated near telomere-to-telomere genome assemblies with 14 previously published ones. Despite substantial variation in assembly size, driven primarily by DNA transposons, gene content was highly conserved, with little evidence for significant morphotype-specific gene presence-absence variation. Population genomic analyses of 407 accessions revealed that fiber flax had reduced nucleotide diversity, extended linkage disequilibrium, and a more compact population structure relative to linseed, consistent with stronger selection and a narrower genetic base. Genome-wide differentiation was heterogeneous and concentrated in discrete regions. Integration of FST, nucleotide diversity ratios, Tajima's D, and genome-wide association signals identified morphotype-enriched genomic blocks distributed across the genome. Many candidate regions are primarily supported by directional shifts in nucleotide diversity rather than extreme differentiation, indicating selection on standing genetic variation. Genome-wide association analyses identified 1,712 unique quantitative trait nucleotides (QTNs), with predominantly small effect sizes and strong enrichment in gene-proximal regions, consistent with a polygenic architecture. Overall, fiber flax traits tend to be controlled by fewer loci with moderate-to-large effects, whereas linseed traits exhibit a more diffuse genetic architecture. Patterns of Tajima's D further support non-classical selection dynamics, with predominantly positive values in linseed and localized negative values in fiber flax, consistent with selection on standing genetic variation.
Together, our results suggest that flax morphotype divergence is driven primarily by selection on pre-existing allelic variation within a conserved gene repertoire. This study provides a comprehensive framework linking genome structure, population genomics, and trait architecture, and highlights the importance of standing genetic variation as a key resource for flax breeding and improvement.

12

Pan-genomic and pan-transcriptomic analysis of the Heavy Metal ATPase family reveals diverse expression patterns and functional roles in barley

Shadbolt, J.; Schreiber, M.; Russell, J.; Waugh, R.; Houston, K.

2026-07-08 plant biology 10.64898/2026.07.07.736986 medRxiv

Top 0.3%

4.2%

Heavy metals act as essential metalloprotein cofactors in numerous physiological processes but can become toxic when non-essential metals accumulate or when essential metals are in excess. As plants continuously encounter heavy metals through their roots, they have evolved complex homeostatic mechanisms to regulate metal uptake and distribution. The Heavy Metal ATPase (HMA) gene family encodes a group of heavy metal transporting P-type ATPases that have been linked to stress resistance and nutrient supply. Here, we used a bioinformatics approach to identify and characterise 13 HMA genes containing characteristic P1B-type ATPase domains and motifs in the barley Morex V3 reference genome. The genes are located on five of the seven barley chromosomes. Phylogenetic analysis revealed that they cluster into five sub-clades, including one clade unique to barley. Expression profiling across multiple datasets showed distinct temporal and tissue-specific expression patterns among HvHMAs, with several members exhibiting significant transcriptional responses to specific biotic and abiotic stresses. By utilising recently available pan-transcriptomic and pan-genomic resources, we have identified substantial allelic diversity and inter-accession variation in HvHMAs. Our findings suggest that HvHMAs have functions extending beyond canonical heavy metal homeostasis and warrant further investigation for their potential roles in broader physiological and stress-related processes.

13

Introducing PHJ Media: A Unique Machine Learning-Driven Basal Formulation to Overcome Recalcitrance for Multi-Genotype Micropropagation of Cannabis sativa L.

Pepe, M.; Hesami, M.; Jones, M.

2026-07-15 plant biology 10.64898/2026.07.14.738465 medRxiv

Top 0.3%

4.2%

Applications of tissue culture are critical for Cannabis sativa L. (cannabis), supporting clonal propagation, germplasm preservation, and pathogen elimination, among other biotechnological applications. However, extensive genetic diversity associated with cannabis results in highly variable responses to in vitro conditioning, and no consensus basal media formulation exists to support reproducible micropropagation across genotypes. To address these limitations, a hybridized ensemble-NSGA-II approach was employed for concurrent optimization of individual media components to create a species-specific, cultivar-inclusive basal salt formulation for cannabis micropropagation. The resulting PHJ media represents a unique formulation that overcomes recalcitrance across a wide array of cannabis cultivars, facilitating improved growth and uniformity for the nine cultivars used in its development and validation. These results remain consistent from explant initiation through multiple rounds of subculture. The ability of PHJ to overcome genotypic recalcitrance suggests potential applicability to an array of plant species beyond cannabis. Additionally, robust performance both with and without plant growth regulators underscores the plausible use of PHJ for diverse applications beyond standard micropropagation. Ultimately, this cultivar-inclusive basal medium demonstrates utility for both scientific research and industrial-scale operations.

14

Evolutionary dynamics of Aegilops revealed through comparative genome assembly of all 25 species

Shazadee, H.; Edwards, T.; Levesque-Lemay, M.; Zheng, C.; Ens, J.; Pozniak, C. J.; You, F. M.; Cloutier, S.

2026-07-10 genomics 10.64898/2026.07.09.737531 medRxiv

Top 0.3%

3.5%

Aegilops species are the closest wild relatives of wheat and an important reservoir of genetic diversity for its improvement. Despite their potential, many Aegilops genomes remain poorly characterized. Here we present high-quality assemblies of 18 diploid, tetraploid, and hexaploid Aegilops genomes, which, along with the previously published genomes, complete the production of reference assemblies for all 25 genomes in this genus. Assembly sizes ranged from 5.24 Gb in diploids to 12.65 Gb in hexaploids, with scaffold N50 values up to 749.2 Mb. Gene annotation identified 53,035-156,779 protein-coding genes, of which 21,865-60,490 were classified as high-confidence. Orthogroup-based pangenome analysis across the 25 Aegilops genomes identified 80,521 orthogroups, including 15,809 core, 61,735 dispensable, and 2,977 species-specific orthogroups, highlighting substantial gene content variation among genomes. Phylogenetic analysis of 63 Triticum and Aegilops genomes/subgenomes based on near single-copy orthologs defines the phylogenetic relationships within the Triticum/Aegilops complex and confirms diploid progenitors of polyploid lineages. Ae. mutica (T) and Ae. speltoides (S) belong to the B lineage while the remaining Sitopsis grouped within the D lineage. Structural variation analyses using diploid progenitors as references revealed extensive large-scale rearrangements following polyploidization, emphasizing the dynamics of their evolution. Transposable element (TE) annotation further highlighted subgenome-specific TE expansions and contractions, providing insights into the mechanisms shaping genome structure after polyploidization. Collectively, these genomic resources provide a comprehensive framework for exploring Aegilops diversity, understanding polyploid evolution, and accelerating wheat improvement.
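The core, dispensable, and species-specific categories in this abstract can be illustrated with a small Python sketch. The thresholds used here are a common convention and an assumption about this study, not something the abstract states: an orthogroup is called core if present in every genome, species-specific if present in exactly one, and dispensable otherwise. The function name `classify_orthogroups` and the toy presence table are hypothetical.

```python
def classify_orthogroups(presence: dict, n_genomes: int) -> dict:
    """Count orthogroups by category from a mapping of orthogroup -> genomes containing it.

    Assumed convention: core = present in all genomes, species_specific =
    present in exactly one genome, dispensable = everything in between.
    """
    counts = {"core": 0, "dispensable": 0, "species_specific": 0}
    for orthogroup, genomes in presence.items():
        n_present = len(set(genomes))
        if n_present == 0:
            raise ValueError(f"{orthogroup} is not present in any genome")
        if n_present == n_genomes:
            counts["core"] += 1
        elif n_present == 1:
            counts["species_specific"] += 1
        else:
            counts["dispensable"] += 1
    return counts


# Toy example with three hypothetical genomes (not data from the study).
toy = {
    "OG1": ["A", "B", "C"],  # present everywhere -> core
    "OG2": ["A", "B"],       # present in a subset -> dispensable
    "OG3": ["C"],            # present in one genome -> species-specific
    "OG4": ["B", "C"],       # present in a subset -> dispensable
}
print(classify_orthogroups(toy, n_genomes=3))

# The three category counts reported in the abstract add up to the stated total.
assert 15_809 + 61_735 + 2_977 == 80_521
```

The final assertion confirms only that the three reported category counts sum to the stated total of 80,521 orthogroups.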

15

High-throughput stomatal phenotyping provides selection targets for stress-resilient wheat

Mabrouk, M.; Russell, N. J.; Alegria, E. V.; Wang, T.-C.; Liang, J.-A.; Wu, F.-J.; Huang, Y.; Wittkop, B.; Snowdon, R.; Förter, L.; Moritz, A.; Herzog, E.; Ganji, E.; Wehner, G.; Stahl, A.; Chen, T.-W.

2026-07-13 plant biology 10.64898/2026.07.10.737162 medRxiv

Top 0.4%

3.4%

Phenotyping stomatal traits and their developmental plasticity is time-consuming but holds potential to improve water use efficiency and photosynthesis for designing stress-tolerant crops under climate change. Here, we develop a robust, high-throughput pipeline for phenotyping 14 stomatal traits in winter wheat related to size, variation, maximum conductance, and spatial patterning. We (1) analyze over 25,000 images from 60 wheat cultivars grown in growth chamber, greenhouse, and field conditions; (2) investigate the impact of light, temperature, and reduced water and nitrogen supply on stomatal traits and their developmental plasticity across adaxial and abaxial surfaces; and (3) evaluate genetic diversity and breeding progress of stomatal traits. Stomatal traits showed high broad-sense heritability, were largely plastic in response to environmental conditions, and showed genotype-specific responses. Stomatal traits of third leaves under controlled environments with stable light and temperature conditions reliably captured the genetic variance of flag leaves under field conditions. Our data suggest that the upper leaf surface contributed more to transpiration and cooling through consistently higher stomatal density, area, and maximum conductance, while the lower surface facilitated CO2 diffusion via systematic proper patterning and spacing. Breeding maintains the genetic diversity of stomatal traits, and our pipeline enables breeders to target them to enhance water use efficiency in high-yielding modern cultivars.

16

VigExp: A functionally verified platform for aiding cowpea (Vigna unguiculata) and related legume crop improvement

Su, H.; Mazurkiewicz, D.; Gursanscky, N.; Riboni, M.; Juranic, M.; Johnson, S. D.; Yow, J. H.; Deo, J.; Liu, Y.; Mattinson, A.; Leon-Martinez, G.; Escobar-Guzman, R.; Salinas-Gamboa, R.; Amasende-Morales, I.; Vielle-Calzada, J.-P.; Koltunow, A. M. G.; Ferguson, B. J.

2026-07-09 plant biology 10.64898/2026.06.30.735734 medRxiv

Top 0.4%

3.1%

Legumes include some of the world's most significant crop species, such as cowpea (Vigna unguiculata), a subsistence crop widely grown in sub-Saharan Africa. Despite their importance, legume crop improvement is hindered by a lack of high-resolution expression data, particularly for reproductive tissues and cell types. Here, we report on VigExp, a tool for visualising cowpea gene expression datasets. We demonstrate its utility across a range of vegetative and reproductive cell types of varieties IT97K-499-35 and IT86D-1010, which exhibit 93.75% protein sequence conservation and are amenable to stable transformation. This includes previously published transcriptomes of vegetative, floral and seed tissues, combined with developmentally staged male and female reproductive tissues. Also integrated are novel transcriptomes of laser-captured cell types covering reproductive development from meiosis to early embryo formation post-fertilisation. Spatial expression patterns and transcript levels can be visualised through an electronic fluorescent pictograph (eFP) browser. Validated by RT-qPCR, in situ hybridisation, transgenic, and CRISPR gene editing analyses, the predictive accuracy of VigExp matches prior cowpea functional study observations. Critical genes for nodule development and regulation were also identified and their expression patterns established in cowpea. Novel reference genes, constitutively expressed gene promoters for visualisation markers/gene editing, and tissue- and cell-specific gene promoters for targeting these regions are identified. The A-type cyclin, VuTAM2, was also identified, with a critical role in male meiosis established. Collectively, VigExp represents an adaptable and updatable resource to support crop improvement in cowpea and other legumes, which are often highly syntenic with respect to genome composition.

17

Transgressive gene expression and methylation remodeling in an intraspecific hexaploid wheat hybrid

Ardaman, A.; Forgiarini, C.; Arunkumar, R.

2026-07-09 plant biology 10.64898/2026.06.29.735383 medRxiv

Top 0.4%

3.1%

Intraspecific hybridization in allopolyploid plant genomes has the potential to induce non-additive changes in gene expression and DNA cytosine methylation, partly through interactions among divergent parental subgenomes. However, the extent to which intraspecific hybridization reshapes gene expression, coordinates homoeolog regulation, and remodels methylation in higher-order polyploids remains poorly quantified. To address this, we sequenced seedling leaf transcriptomes and methylomes from two parental cultivars of hexaploid bread wheat (Triticum aestivum L.) and their hybrids. More than 40% of genes were differentially expressed between hybrids and parents, although many were not differentially expressed between the parents themselves, consistent with complex trans-regulatory effects in the hybrid genome. This effect was more pronounced for homoeologs whose relative expression differed between the parents. These expression shifts often occurred simultaneously across all three homoeologs within triads, reducing homoeolog expression bias (HEB) in the hybrids. CG methylation levels were similar between the parents and hybrids in regions of low genetic divergence and in transposable element (TE)-rich regions, whereas CG sites in gene-rich regions showed more additive inheritance (hybrids intermediate between parents), particularly when parental haplotypes were themselves divergent. TE and gene body methylation (gbM) was strongly conserved in parents and hybrids. gbM was associated with more balanced homoeolog expression and fewer non-additive expression changes. CHH methylation showed overdominance, whereas non-conserved CHG methylation was enriched in TE-rich regions, suggesting that non-CG remodeling may reflect parental differences in TE and small-RNA content. 
Our results show that intraspecific hybridization within a hexaploid species can generate non-additive changes in gene expression and DNA methylation in seedling leaf tissue, while the presence of homoeologous genes, parental HEB, parental genetic and methylation divergence, and genomic location have varying levels of influence on expression or methylation remodeling.

18

An in vitro regeneration system with efficient rooting in sweet orange (Citrus sinensis) supports recovery of transgenic plants

Datta, J.; Bhowmik, S. D.; Williams, B.; Kerr, S. C.

2026-07-08 plant biology 10.64898/2026.06.16.732047 medRxiv

Top 0.4%

2.6%

In vitro regeneration of Citrus plants is widely used; however, induction of adventitious roots from regenerated shoots remains a major bottleneck, limiting the recovery of healthy plants for commercial production and genomic research for crop improvement. We established an in vitro regeneration system producing profuse, healthy roots for sweet orange (Citrus sinensis cv. Benyenda) by optimising combinations and concentrations of auxins. Prior to optimising the rooting media (RTMs), we obtained a shoot regeneration rate of 90.6% from sweet orange epicotyl explants using a cytokinin, 6-benzylaminopurine (BAP). Across twelve auxin-supplemented RTMs containing different concentrations of indole-3-butyric acid (IBA) and/or 1-naphthaleneacetic acid (NAA), rooting percentages ranged from 8% to 87.5%. The combination of IBA 1.0 mg L⁻¹ and NAA 0.1 mg L⁻¹ gave the best overall performance, with a rooting percentage of 75 ± 7.2% and healthy, callus-free roots (≥5 cm in length), whereas RTMs with other auxin combinations induced callus and limited root elongation. The best-performing shoot regeneration medium (SRM) and RTM were subsequently used for selection and recovery of transgenic sweet orange lines carrying an empty CRISPR/Cas9 construct, resulting in a 4.8% transformation efficiency. Both transgenic and non-transgenic rooted plantlets were successfully acclimatised under glasshouse conditions with a survival rate of 90%. This enhanced regeneration system overcomes the rooting bottleneck and improves plant survival, enabling faster recovery of transgenic citrus lines within four months. It supports accelerated development for commercial applications and advances in citrus genetic improvement.

19

A first pangenomic framework for globe artichoke supports SNP-based varietal fingerprinting

Portis, E.; Vergnano, E.; Gaccione, L.; Acquadro, A.; Comino, C.; Carli, C.; Barchi, L.; Martina, M.

2026-06-26 plant biology 10.64898/2026.06.25.734495 medRxiv

Top 0.4%

2.4%

Globe artichoke (Cynara cardunculus var. scolymus L.) comprises a broad range of local ecotypes and varietal groups whose genetic diversity has been investigated through different molecular markers. However, recent advances in next-generation sequencing and pangenomics approaches provide new opportunities to capture genome-wide variation at higher resolution and to develop practical tools for varietal discrimination, traceability, and germplasm conservation. In this study, we developed the first pangenomic framework for cultivated artichoke and evaluated pangenome-informed SNP markers for varietal fingerprinting. Whole-genome resequencing data from the Italian local ecotype Asti Sori were integrated with publicly available genomic data from representative globe artichoke and cultivated cardoon accessions to construct and annotate a pangenome. Genome-wide SNP and presence/absence variation (PAV) analyses were combined with pangenome-anchored genotyping-by-sequencing (GBS) data from 45 accessions representing the main cultivated varietal groups. The pangenome revealed a largely conserved core gene repertoire alongside a smaller accessory component, with gene accumulation curves suggesting a tendency toward saturation within the sampled cultivated germplasm. SNP- and PAV-based analyses provided complementary views of accession relationships and consistently resolved the principal cultivated groups. Across the broader germplasm panel, pangenome-anchored GBS-derived SNPs identified well-supported phylogenetic clusters corresponding to recognized varietal types. A reduced panel of 50 SNPs, selected through iterative random subsampling, retained at least 90% of the genetic diversity captured by the full dataset and reproduced its main population structure. This compact pangenome-anchored marker set provides a practical foundation for varietal fingerprinting, DUS-oriented applications, traceability, and conservation of traditional globe artichoke germplasm. 
Validation across independent collections will be required before routine deployment.

20

Identification of Seed Metabolites and Microbiota members associated with Germination and Emergence in Common Bean

Colaert-Sentenac, L.; Planchet, E.; Abadie, C.; Lalande, J.; Hamdy, S.; Marais, C.; Dupont, A.; Le Corre, L.; Koutouan, C.-E.; Wagner, M.-H.; Barret, M.; Tcherkez, G.; Teulat, B.; Simonin, M.

2026-07-08 plant biology 10.64898/2026.06.16.732447 medRxiv

Top 0.5%

2.1%

Seed quality is a complex trait shaped by morphological, biochemical and microbiological properties that are rarely characterised simultaneously, limiting our ability to identify robust predictive indicators of germination speed and seedling emergence across varieties. Here, we performed a multi-factor characterisation of eight common bean (Phaseolus vulgaris L.) varieties, combining seed morphometrics, untargeted GC-MS metabolomics on three seed organs, and amplicon sequencing of bacterial and fungal communities, to identify indicators of germination speed and emergence percentage. The eight varieties showed substantial variation in both traits, used as physiological seed quality proxies. Seed weight and size variation between varieties were correlated with germination speed. The intravariety variance of seed weight was independently correlated with emergence performance. Metabolome composition differed strongly across seed organs, with variety as the dominant driver. Individual-seed metabolomic profiles in the plumule and cotyledon were associated with germination speed but not emergence, yielding 16 plumule and three cotyledon candidate metabolite markers. Fungal community composition was associated with both germination speed and emergence, while bacterial communities were associated with emergence only. Nine fungal and four bacterial taxa were identified as candidate indicators. Inter-kingdom co-occurrence network analysis revealed that fungi with similar germination speed associations tend to cluster in the same modules, suggesting that community-level modules rather than individual taxa may constitute more robust microbial indicators. These results demonstrate that germination speed and emergence capacity are governed by distinct seed properties, and provide morphological, metabolic and microbial candidate indicators for integration into targeted seed quality assessment frameworks for common bean.